Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend #108796

Open
wants to merge 84 commits into
base: main
Choose a base branch
from

Conversation

Ruihan-Yin
Copy link
Contributor

@Ruihan-Yin Ruihan-Yin commented Oct 11, 2024

Overview

This PR is built based on #106557, and is the first one that covers APX-EXTENDED-EVEX encoding.

This PR adds extended EVEX encoding for legacy instructions that are promoted to the EVEX encoding space, currently only instructions wit the new data destination (EVEX.NDD) feature, are covered in the PR. We plan to cover the encoding and instructions for flag suppression (EVEX.NF) in follow-up PRs.

EVEX.ND covered instructions:

INC, DEC, NOT, NEG, 

ADD, SUB, AND, OR, XOR, 

SAL, SAR, SHL, SHR, RCL, RCR, ROL, ROR, 

CMOVcc, IMUL(0xAF).

EVEX.NF cover instructions:

 INC, DEC, NEG, ADD, SUB, AND, OR, XOR, SAL, IMUL, IDIV, MUL, DIV,

ROL, ROR, RCL, RCR, SHL, SHR, SAR

TZCNT, LZCNT, POPCNT,

ANDN, BEXTR, BLSI, BLSMSK, BLSR.

Specification

EVEX extension of legacy instructions is one of the changes made on the original EVEX prefix to accommodate the ISA features and new instructions introduced by APX, and this part of extension focuses on promoting legacy instructions into EVEX encoding space and providing them with features like EGPR access, new data destination, zero upper, flag suppression.

image

As shown in the figure, some bits in original EVEX prefix have been re-purposed: EVEX.b to EVEX.ND, first bit of EVEX.aaa to EVEX.NF, and some bits have become reserved and has to be 0. Also, the promoted legacy instructions take a new legacy-map-index: map-4, as shown at EVEX.bits[18:16], say EVEX.mmm field, to be 100b.

All the promoted legacy instructions should follow this encoding schema, and for instructions that does not use these REX bits for access upper registers, these bits: EVEX.R4, X4, B4, R3, X3, B3 should be kept in logical-0 (0, or 1 if defined in inverted way.).

Design

As stated above, this PR will cover the encoding changes needed for EVEX extension for legacy instructions and support for EVEX.ND.

The bulk of the changes occur in the backend emitter, and some changes are added to code generation as the entry of optimization of NDD format.

One part I need to call out in the design is that we separated the EVEX encoding path for legacy instructions with the original EVEX path, and the new emit path will be guarded by TakesApxExtendedEvexPrefix. The main reason for this is that the legacy extension part for APX-EVEX will break the assumption that EVEX is only for SIMD instructions and will only be appear on SIMD instruction emit paths, which JIT carries a lot of assertion check to verify. To let the original checks hold as much as possible, we finally chose to establish a stand-alone branch for extended legacy instructions on the path that does not have legacy encoding, or re-use the existing legacy encoding path with some prefix work.

Optimization & Performance

In the asmdiff part below, code size regression was observed, say the use of EVEX.ND feature will increase the code size, in detail, the NDD form will introduce at most 2-byte regression per instruction, this is expected as we are using a 4-byte prefixed instruction to replace 2 legacy instructions which are normally 2 bytes. This creates the tradeoff between code size and instructions count, and we will be contributing to teach JIT how to wisely use this feature to get maximum performance gain while controlling the code size regression with a series of followed tuning works.

For better tuning the features, we added the optimization knob for NDD: JitEnableAPXNDD, now NDD optimization is there for a few binary and unary instructions when the target register is different from src operands, but to use this feature more wisely, we will need more tunning work in the future, so we plan to have individual tunning knob for each feature APX provides, like NDD, NF, etc.

Testing

Note: The testing plan for APX work has been discussed in #106557, please refer to that PR for details, only results and comments will be posted in this PR.

Results separately posted below.

Follow-up plans

After this PR, we will continue to complete the APX-EVEX support for EVEX.NF for legacy/VEX instructions, and further APX-EVEX support for VEX/EVEX instructions.

Edit:
We eventually decided to cover the EVEX.NF feature within this PR as well. This feature will be enabled with encoding only, and there will be no active surface for this feature until we have some related codegen works.

In summary, this PR covers all the changes to enable EVEX.ND/NF feature, plus the needed register encoding, while this PR is not intended for full coverage for this part.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 11, 2024
@Ruihan-Yin
Copy link
Contributor Author

1. Emitter unit tests

JIT disassembler outputs:
image

LLVM disassembler outputs:
image

No disassemble diff observed:
image

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 11, 2024
@Ruihan-Yin
Copy link
Contributor Author

2. SuperPMI

Verification with SuperPMI:

asmdiffs:
Diffs are based on 2,830,588 contexts (1,185,269 MinOpts, 1,645,319 FullOpts).

MISSED contexts: base: 0 (0.00%), diff: 11 (0.00%)

Diff JIT options: JitBypassAPXCheck=1

Overall (+330,453 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 49,406,065 +17,322 -0.62%
benchmarks.run.windows.x64.checked.mch 12,230,572 +12,326 -0.78%
benchmarks.run_pgo.windows.x64.checked.mch 40,192,955 +34,649 -0.73%
benchmarks.run_tiered.windows.x64.checked.mch 17,606,620 +8,076 -0.87%
coreclr_tests.run.windows.x64.checked.mch 409,086,766 +40,588 -0.42%
libraries.crossgen2.windows.x64.checked.mch 45,250,222 +15,319 -1.03%
libraries.pmi.windows.x64.checked.mch 63,022,393 +26,233 -1.10%
libraries_tests.run.windows.x64.Release.mch 336,307,360 +113,846 -0.62%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 147,986,092 +48,675 -0.47%
realworld.run.windows.x64.checked.mch 11,552,911 +5,010 -0.76%
smoke_tests.nativeaot.windows.x64.checked.mch 5,023,568 +8,409 -1.45%
MinOpts (+17,921 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 23,379,337 +552 -0.38%
benchmarks.run.windows.x64.checked.mch 588 +2 -0.43%
benchmarks.run_pgo.windows.x64.checked.mch 18,796,230 +1,116 -0.46%
benchmarks.run_tiered.windows.x64.checked.mch 13,707,415 +945 -0.45%
coreclr_tests.run.windows.x64.checked.mch 287,081,075 +8,891 -0.44%
libraries.crossgen2.windows.x64.checked.mch 1,705 +2 -0.43%
libraries.pmi.windows.x64.checked.mch 112,961 +2 -0.43%
libraries_tests.run.windows.x64.Release.mch 203,705,533 +5,515 -0.42%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 10,696,900 +896 -0.12%
FullOpts (+312,532 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 26,026,728 +16,770 -0.63%
benchmarks.run.windows.x64.checked.mch 12,229,984 +12,324 -0.78%
benchmarks.run_pgo.windows.x64.checked.mch 21,396,725 +33,533 -0.74%
benchmarks.run_tiered.windows.x64.checked.mch 3,899,205 +7,131 -0.98%
coreclr_tests.run.windows.x64.checked.mch 122,005,691 +31,697 -0.42%
libraries.crossgen2.windows.x64.checked.mch 45,248,517 +15,317 -1.03%
libraries.pmi.windows.x64.checked.mch 62,909,432 +26,231 -1.10%
libraries_tests.run.windows.x64.Release.mch 132,601,827 +108,331 -0.64%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 137,289,192 +47,779 -0.48%
realworld.run.windows.x64.checked.mch 11,139,943 +5,010 -0.76%
smoke_tests.nativeaot.windows.x64.checked.mch 5,022,597 +8,409 -1.45%

tpdiff:

Diff JIT options: JitBypassAPXCheck=1

Overall (+0.27% to +0.60%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.41%
benchmarks.run.windows.x64.checked.mch +0.27%
benchmarks.run_pgo.windows.x64.checked.mch +0.35%
benchmarks.run_tiered.windows.x64.checked.mch +0.60%
coreclr_tests.run.windows.x64.checked.mch +0.53%
libraries.crossgen2.windows.x64.checked.mch +0.38%
libraries.pmi.windows.x64.checked.mch +0.30%
libraries_tests.run.windows.x64.Release.mch +0.46%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.32%
realworld.run.windows.x64.checked.mch +0.29%
smoke_tests.nativeaot.windows.x64.checked.mch +0.27%
MinOpts (+0.82% to +1.08%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +1.08%
benchmarks.run.windows.x64.checked.mch +0.92%
benchmarks.run_pgo.windows.x64.checked.mch +1.04%
benchmarks.run_tiered.windows.x64.checked.mch +1.05%
coreclr_tests.run.windows.x64.checked.mch +0.82%
libraries.crossgen2.windows.x64.checked.mch +1.02%
libraries.pmi.windows.x64.checked.mch +0.82%
libraries_tests.run.windows.x64.Release.mch +1.07%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.90%
realworld.run.windows.x64.checked.mch +1.02%
smoke_tests.nativeaot.windows.x64.checked.mch +0.95%
FullOpts (+0.22% to +0.38%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch +0.27%
benchmarks.run.windows.x64.checked.mch +0.27%
benchmarks.run_pgo.windows.x64.checked.mch +0.22%
benchmarks.run_tiered.windows.x64.checked.mch +0.24%
coreclr_tests.run.windows.x64.checked.mch +0.31%
libraries.crossgen2.windows.x64.checked.mch +0.38%
libraries.pmi.windows.x64.checked.mch +0.30%
libraries_tests.run.windows.x64.Release.mch +0.26%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.31%
realworld.run.windows.x64.checked.mch +0.28%
smoke_tests.nativeaot.windows.x64.checked.mch +0.27%

Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@Ruihan-Yin
Copy link
Contributor Author

3. JIT unit tests

DOTNET_JitBypassAPXCheck = 0:
image

DOTNET_JitBypassAPXCheck = 1:
image

@Ruihan-Yin
Copy link
Contributor Author

4. Supplement files:

To see detail diffs, please refer to the following files: (files are too large to display on github)

asm:
asmdiff_summary.md
(~100+ assertion failures related ISA checks, expected due to the CPUID updates).
asm.log

tpdiff:
tpdiff_summary.md
(~500+ assertion failures related to the pipeline itself - map-key missing, should not be related to the changes.)
tpdiff.log

Update comments.

Merge the REX2 changes into the original legacy emit path

bug fix: Set REX2.W with correct mask code.

register encoding and prefix emitting logics.

Add REX2 prefix emit logic

bug fixes

Add Stress mode for REX2 encoding and some bug fixes

resolve comments:
1. add assertion check for UD opcodes.
2. add checks for EGPRs.

Add REX2 to emitOutputAM, and let LEA to be REX2 compatible.

Add REX2.X encoding for SIB byte

But fixes: add REX2 prefix on the path in RI where MOV is specially handled.

Enable REX2 encoding for `movups`

fixed bugs in REX2 prefix emitting logic when working with map 1 instructions, and enabled REX2 for POPCNT

legacy map index-er

bug fixes

some clean-up

Adding initial APX unit testing path.

Adding a coredistools dll that has LLVM APX disasm capability.

It must be coppied into a CORE_ROOT manually.

clean up work for REX2

narrow the REX2 scope to `sub` only

some clean up based on the comments.

bug fix

resolve comment
 - SV path is mostly for debugging purposes

Added encoding unit tests for instructions with immediates
Code refactoring: AddX86PrefixIfNeeded.
… missing in JIT, may indicate these instructions are not being used in JIT, drop them for now.
@Ruihan-Yin Ruihan-Yin changed the title [JIT] Add legacy extended EVEX encoding and EVEX.ND feature to x64 emitter backend [JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend Dec 13, 2024
@Ruihan-Yin
Copy link
Contributor Author

Updated the branch to include EVEX.NF changes within this PR, I would rebase the branch after #106557 is merged for a cleaner cod diffs.

@Ruihan-Yin
Copy link
Contributor Author

This PR now is ready for review

…ob and will not be EVEX-encoded when JitStressEvexEncoding is set.
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. I had a few questions and suggestions.

Comment on lines 784 to 785
instruction ins = genGetInsForOper(tree->OperGet(), targetType);
inst_RV(ins, targetReg, targetType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
instruction ins = genGetInsForOper(tree->OperGet(), targetType);
inst_RV(ins, targetReg, targetType);
inst_RV(ins, targetReg, targetType);

ins is already defined?


inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);
if (JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) && (targetReg != operandReg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that we will always enable with EVEX NDD if we are allowed to? Are there other tuning parameters that would play into this decision, like whether the registers are low/high (so the mov requires a prefix or not, and how large a prefix it requires)?

Will JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) be a common pattern? (that warrants a helper function)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Bruce, thanks for the feedback!

As for now, we are focusing on the feature enabling in these PRs, so here what we did is to introduce the features and hide them behind a feature flag with default to disabled, and for performance-turning, it will be our future target, and is expected to come with separate PRs.

Will JitConfig.JitEnableApxNDD() && GetEmitter()->IsApxNDDEncodableInstruction(ins) be a common pattern? (that warrants a helper function)

Yes, I will work on this to combine them into a helper.

{
// 3-byte
return true;
}

if ((code & 0xFF00FF00) == 0x0F000000)
if ((code & 0xFF000000) == 0x0F000000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're missing some changes in this function that were made in #106557 (comment) (bad merge?)

if (id->idIsEvexNfContextSet() && IsBMIInstruction(ins))
{
// Only a few BMI instructions shall be promoted to APX-EVEX due to NF feature.
// TODO-Ruihan: convert the check into forms like Has* as above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO-Ruihan: convert the check into forms like Has* as above.
// TODO-APX: convert the check into forms like Has* as above.

we don't put names in the comments

@@ -1381,7 +1515,7 @@ bool emitter::TakesRex2Prefix(const instrDesc* id) const
// TODO-xarch-apx:
// At this stage, we are only using REX2 in the case that non-simd integer instructions
// with EGPRs being used in its operands, it could be either direct register uses, or
// memory addressing operands, i.e. index and base.
// memory addresssig operands, i.e. index and base.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bad merge? you are un-doing a typo fix

{
// We can only have one memory operand and only src can be a constant operand
// However, the handling for a given operand type (mem, cns, or other) is fairly
// consistent regardless of whether they are src or dst. As such, we will find
// the type of each operand and only check them against src/dst where relevant.

bool useNDD = UsePromotedEVEXEncoding() && (targetReg != REG_NA);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
bool useNDD = UsePromotedEVEXEncoding() && (targetReg != REG_NA);
const bool useNDD = UsePromotedEVEXEncoding() && (targetReg != REG_NA);

@@ -15515,15 +16041,22 @@ BYTE* emitter::emitOutputRR(BYTE* dst, instrDesc* id)
#endif // FEATURE_HW_INTRINSICS
else
{
// TODO-XArch-APX:
// Ruihan:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove name

{
assert(IsApxExtendedEvexInstruction(ins));
assert(id->idInsFmt() == IF_RWR_RRD_RRD);
switch (id->idIns())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like copy-and-paste code from emitOutputRR. Can we extract a function to handle both cases?

@@ -17801,6 +18627,23 @@ size_t emitter::emitOutputInstr(insGroup* ig, instrDesc* id, BYTE** dp)
{
code = insCodeRM(ins);

if (id->idIsEvexNdContextSet() && TakesApxExtendedEvexPrefix(id))
{
// TODO-XArchh-apx:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TODO-XArchh-apx:
// TODO-XArch-apx:

if (id->idIsEvexNdContextSet() && TakesApxExtendedEvexPrefix(id))
{
// TODO-XArchh-apx:
// Ruihan: I'm not sure why instructions on this path can be with instruction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apx Related to the Intel Advanced Performance Extensions (APX) area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants